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Abstract 

It  is  now  well  understood  that  (1)  it  is  possible  to  reconstruct  sparse  signals  exactly  from 
what  appear  to  be  highly  incomplete  sets  of  linear  measurements  and  (2)  that  this  can  be 
done  by  constrained  t\  minimization.  In  this  paper,  we  study  a  novel  method  for  sparse  signal 
recovery  that  in  many  situations  outperforms  t\  minimization  in  the  sense  that  substantially 
fewer  measurements  are  needed  for  exact  recovery.  The  algorithm  consists  of  solving  a  sequence 
of  weighted  t\ -minimization  problems  where  the  weights  used  for  the  next  iteration  are  computed 
from  the  value  of  the  current  solution.  We  present  a  series  of  experiments  demonstrating  the 
remarkable  performance  and  broad  applicability  of  this  algorithm  in  the  areas  of  sparse  signal 
recovery,  statistical  estimation,  error  correction  and  image  processing.  Interestingly,  superior 
gains  are  also  achieved  when  our  method  is  applied  to  recover  signals  with  assumed  near-sparsity 
in  overcomplete  representations — not  by  reweighting  the  t\  norm  of  the  coefficient  sequence  as  is 
common,  but  by  reweighting  the  norm  of  the  transformed  object.  An  immediate  consequence 
is  the  possibility  of  highly  efficient  data  acquisition  protocols  by  improving  on  a  technique  known 
as  Compressive  Sensing. 

Keywords.  U -minimization,  iterative  reweighting,  underdetermined  systems  of  linear  equa¬ 
tions,  Compressive  Sensing,  the  Dantzig  selector,  sparsity,  FOCUSS. 

1  Introduction 

What  makes  some  scientific  or  engineering  problems  at  once  interesting  and  challenging  is  that 
often,  one  has  fewer  equations  than  unknowns.  When  the  equations  are  linear,  one  would  like  to 
determine  an  object  x$  G  R"  from  data  y  =  4>,to,  where  4*  is  an  m  x  n  matrix  with  fewer  rows  than 
columns;  i.e.,  m  <  n.  The  problem  is  of  course  that  a  system  with  fewer  equations  than  unknowns 
usually  has  infinitely  many  solutions  and  thus,  it  is  apparently  impossible  to  identify  which  of  these 
candidate  solutions  is  indeed  the  “correct”  one  without  some  additional  information. 

In  many  instances,  however,  the  object  we  wish  to  recover  is  known  to  be  structured  in  the 
sense  that  it  is  sparse  or  compressible.  This  means  that  the  unknown  object  depends  upon  a 
smaller  number  of  unknown  parameters.  In  a  biological  experiment,  one  could  measure  changes  of 
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expression  in  30,000  genes  and  expect  at  most  a  couple  hundred  genes  with  a  different  expression 
level.  In  signal  processing,  one  could  sample  or  sense  signals  which  are  known  to  be  sparse  (or 
approximately  so)  when  expressed  in  the  correct  basis.  This  premise  radically  changes  the  problem, 
making  the  search  for  solutions  feasible  since  the  simplest  solution  now  tends  to  be  the  right  one. 

Mathematically  speaking  and  under  sparsity  assumptions,  one  would  want  to  recover  a  sig¬ 
nal  xq  £  R",  e.g.,  the  coefficient  sequence  of  the  signal  in  the  appropriate  basis,  by  solving  the 
combinatorial  optimization  problem 

(Po)  min  1 1 a? 1 1 £0  subject  to  y  =  $>x,  (1) 

where  ||x||^0  =  |{z  :  xt  ^  0} | .  This  is  a  common  sense  approach  which  simply  seeks  the  simplest 
explanation  fitting  the  data.  In  fact,  this  method  can  recover  sparse  solutions  even  in  situations  in 

which  m  <C  n.  Suppose  for  example  that  all  sets  of  m  columns  of  are  in  general  position.  Then 

the  program  (Po)  perfectly  recovers  all  sparse  signals  xo  obeying  ||xo|U0  <  m/2.  This  is  of  little 
practical  use,  however,  since  the  optimization  problem  (1)  is  nonconvex  and  generally  impossible 
to  solve  as  its  solution  usually  requires  an  intractable  combinatorial  search. 

A  common  alternative  is  to  consider  the  convex  problem 

(Pi)  min  ||xL,  subject  to  y  =  <l>x,  (2) 

zeR71 

where  \\x\\^  =  )T)”=]  |xj|.  Unlike  (Po),  this  problem  is  convex — it  can  actually  be  recast  as  a  linear 
program — and  is  solved  efficiently  [1],  The  programs  (Po)  and  (Pi)  differ  only  in  the  choice  of 
objective  function,  with  the  latter  using  an  t\  norm  as  a  proxy  for  the  literal  sparsity  count.  As 
summarized  below,  a  recent  body  of  work  has  shown  that  perhaps  surprisingly,  there  are  conditions 
guaranteeing  a  formal  equivalence  between  the  combinatorial  problem  (Po)  and  its  relaxation  (Pi). 

The  use  of  the  i\  norm  as  a  sparsity-promoting  functional  traces  back  several  decades.  A  lead¬ 
ing  early  application  was  reflection  seismology,  in  which  a  sparse  reflection  function  (indicating 
meaningful  changes  between  subsurface  layers)  was  sought  from  bandlimited  data.  In  1979,  Taylor, 
Banks  and  McCoy  [2]  proposed  the  use  of  t\  to  deconvolve  seismic  traces  by  improving  on  earlier 
ideas  of  Claerbout  and  Muir  [3].  Over  the  next  decade  this  idea  was  refined  to  better  handle  obser¬ 
vation  noise  [4],  and  the  sparsity-promoting  nature  of  i\  minimization  was  empirically  confirmed. 
Rigorous  results  began  to  appear  in  the  late-1980’s,  with  Donoho  and  Stark  [5]  and  Donoho  and 
Logan  [6]  quantifying  the  ability  to  recover  sparse  reflectivity  functions.  The  application  areas  for 
t\  minimization  began  to  broaden  in  the  mid-1990’s,  as  the  LASSO  algorithm  [7]  was  proposed 
as  a  method  in  statistics  for  sparse  model  selection,  Basis  Pursuit  [8]  was  proposed  in  compu¬ 
tational  harmonic  analysis  for  extracting  a  sparse  signal  representation  from  highly  overcomplete 
dictionaries,  and  a  related  technique  known  as  total  variation  minimization  was  proposed  in  image 
processing  [9,10]. 

Some  examples  of  t\  type  methods  for  sparse  design  in  engineering  include  Vandenberghe  et 
al.  [11,  12]  for  designing  sparse  interconnect  wiring,  and  Hassibi  et  al.  [13]  for  designing  sparse 
control  system  feedback  gains.  In  [14],  Dahleh  and  Diaz-Bobillo  solve  controller  synthesis  problems 
with  an  i\  criterion,  and  observe  that  the  optimal  closed-loop  responses  are  sparse.  Lobo  et 
al.  used  t\  techniques  to  find  sparse  trades  in  portfolio  optimization  with  fixed  transaction  costs 
in  [15].  In  [16],  Ghosh  and  Boyd  used  t\  methods  to  design  well  connected  sparse  graphs;  in  [17], 
Sun  et  al.  observe  that  optimizing  the  rates  of  a  Markov  process  on  a  graph  leads  to  sparsity. 
In  [1,  §6.5.4,  §11.4.1],  Boyd  and  Vandenberghe  describe  several  problems  involving  t\  methods  for 
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sparse  solutions,  including  finding  small  subsets  of  mutually  infeasible  inequalities,  and  points  that 
violate  few  constraints.  In  a  recent  paper,  Koh  et  al.  used  these  ideas  to  carry  out  piecewise-linear 
trend  analysis  [18]. 

Over  the  last  decade,  the  applications  and  understanding  of  t\  minimization  have  continued  to 
increase  dramatically.  Donoho  and  Huo  [19]  provided  a  more  rigorous  analysis  of  Basis  Pursuit,  and 
this  work  was  extended  and  refined  in  subsequent  years,  see  [20-22].  Much  of  the  recent  focus  on 
£i  minimization,  however,  has  come  in  the  emerging  field  of  Compressive  Sensing  [23-25].  This  is  a 
setting  where  one  wishes  to  recover  a  signal  xq  from  a  small  number  of  compressive  measurements 
y  =  $xo-  It  has  been  shown  that  t\  minimization  allows  recovery  of  sparse  signals  from  remarkably 
few  measurements  [26,27]:  supposing  <f>  is  chosen  randomly  from  a  suitable  distribution,  then  with 
very  high  probability,  all  sparse  signals  xo  for  which  ||xo|k0  is  m/a  with  a  =  0(log(n/m))  can  be 
perfectly  recovered  by  using  (Pi).  Moreover,  it  has  been  established  [27]  that  Compressive  Sensing 
is  robust  in  the  sense  that  t\  minimization  can  deal  very  effectively  (a)  with  only  approximately 
sparse  signals  and  (b)  with  measurement  noise.  The  implications  of  these  facts  are  quite  far- 
reaching,  with  potential  applications  in  data  compression  [24,28],  digital  photography  [29],  medical 
imaging  [23,30],  error  correction  [31,32],  analog-to-digital  conversion  [33],  sensor  networks  [34,35], 
and  so  on.  (We  will  touch  on  some  more  concrete  examples  in  Section  3.) 

The  use  of  l\  regularization  has  become  so  widespread  that  it  could  arguably  be  considered  the 
“modern  least  squares” .  This  raises  the  question  of  whether  we  can  improve  upon  t\  minimization? 
It  is  natural  to  ask,  for  example,  whether  a  different  (but  perhaps  again  convex)  alternative  to  l§ 
minimization  might  also  find  the  correct  solution,  but  with  a  lower  measurement  requirement  than 
t\  minimization. 

In  this  paper,  we  consider  one  such  alternative,  which  aims  to  help  rectify  a  key  difference 
between  the  t\  and  norms,  namely,  the  dependence  on  magnitude:  larger  coefficients  are  penalized 
more  heavily  in  the  t\  norm  than  smaller  coefficients,  unlike  the  more  democratic  penalization  of  the 
norm.  To  address  this  imbalance,  we  propose  a  weighted  formulation  of  t\  minimization  designed 
to  more  democratically  penalize  nonzero  coefficients.  In  Section  2,  we  discuss  an  iterative  algorithm 
for  constructing  the  appropriate  weights,  in  which  each  iteration  of  the  algorithm  solves  a  convex 
optimization  problem,  whereas  the  overall  algorithm  does  not.  Instead,  this  iterative  algorithm 
attempts  to  find  a  local  minimum  of  a  concave  penalty  function  that  more  closely  resembles  the 
£q  norm.  Finally,  we  would  like  to  draw  attention  to  the  fact  that  each  iteration  of  this  algorithm 
simply  requires  solving  one  t \  minimization  problem,  and  so  the  method  can  be  implemented  readily 
using  existing  software. 

In  Section  3,  we  present  a  series  of  experiments  demonstrating  the  superior  performance  and 
broad  applicability  of  this  algorithm,  not  only  for  recovery  of  sparse  signals,  but  also  pertaining 
to  compressible  signals,  noisy  measurements,  error  correction,  and  image  processing.  This  section 
doubles  as  a  brief  tour  of  the  applications  of  Compressive  Sensing.  In  Section  4,  we  demonstrate 
the  promise  of  this  method  for  efficient  data  acquisition.  Finally,  we  conclude  in  Section  5  with  a 
final  discussion  of  related  work  and  future  directions. 
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2  An  iterative  algorithm  for  reweighted  l\  minimization 

2.1  Weighted  t\  minimization 

Consider  the  “weighted”  t\  minimization  problem 

(WPi)  min  )  wAxA  subject  to  y  =  <l>x,  (3) 

irGR"  z— J 

2=1 

where  W\,W2, ...  ,wn  are  positive  weights.  Just  like  its  “unweighted”  counterpart  (Pi),  this  convex 
problem  can  be  recast  as  a  linear  program.  In  the  sequel,  it  will  be  convenient  to  denote  the 
objective  functional  by  ||VPa;||£1  where  W  is  the  diagonal  matrix  with  w\,.. .  ,wn  on  the  diagonal 
and  zeros  elsewhere. 

The  weighted  i\  minimization  (WPi)  can  be  viewed  as  a  relaxation  of  a  weighted  £q  minimization 
problem 

(WPo)  min  ||VPx|hn  subject  to  y  =  <bx.  (4) 

ieRn 

Whenever  the  solution  to  (Po)  is  unique,  it  is  also  the  unique  solution  to  (WPo)  provided  that 
the  weights  do  not  vanish.  However,  the  corresponding  t\  relaxations  (Pi)  and  (WPi)  will  have 
different  solutions  in  general.  Hence,  one  may  think  of  the  weights  (wi)  as  free  parameters  in  the 
convex  relaxation,  whose  values — if  set  wisely — could  improve  the  signal  reconstruction. 

This  raises  the  immediate  question:  what  values  for  the  weights  will  improve  signal  reconstruc¬ 
tion?  One  possible  use  for  the  weights  could  be  to  counteract  the  influence  of  the  signal  magnitude 
on  the  l\  penalty  function.  Suppose,  for  example,  that  the  weights  were  inversely  proportional  to 
the  true  signal  magnitude,  i.e.,  that 


Wi  =  {  Xo’^0’ 
I  oo,  x0,i  =  0. 


If  the  true  signal  xo  is  fc-sparse,  i.e.,  obeys  1 1 mo  | U0  <  k,  then  (WPi)  is  guaranteed  to  find  the  correct 
solution  with  this  choice  of  weights,  assuming  only  that  m  >  k  and  that  just  as  before,  the  columns 
of  4*  are  in  general  position.  The  large  (actually  infinite)  entries  in  Wi  force  the  solution  x  to 
concentrate  on  the  indices  where  Wi  is  small  (actually  finite),  and  by  construction  these  correspond 
precisely  to  the  indices  where  xo  is  nonzero.  It  is  of  course  impossible  to  construct  the  precise 
weights  (5)  without  knowing  the  signal  xo  itself,  but  this  suggests  more  generally  that  large  weights 
could  be  used  to  discourage  nonzero  entries  in  the  recovered  signal,  while  small  weights  could  be 
used  to  encourage  nonzero  entries. 

For  the  sake  of  illustration,  consider  the  simple  3-D  example  in  Figure  1,  where  xq  =  [(1  1  0]2 

and 


$  = 


2  1  1 
112' 


We  wish  to  recover  xo  from  y  =  Txo  =  [1  1]T ■  Figure  1(a)  shows  the  original  signal  x'o,  the  set 
of  points  x  G  M3  obeying  4>x  =  <3?xo  =  y,  and  the  i\  ball  of  radius  1  centered  at  the  origin.  The 
interior  of  the  t\  ball  intersects  the  feasible  set  =  y ,  and  thus  (Pi)  finds  an  incorrect  solution, 
namely,  x *  =  [1/3  0  1/3] T  /  xo  (see  Figure  1(b)). 

Consider  now  a  hypothetical  weighting  matrix  W  =  diag([3  1  3]T).  Figure  1(c)  shows  the 
“weighted  l\  ball”  of  radius  HlFx]^  =  1  centered  at  the  origin.  Compared  to  the  unweighted  i\ 


4 


&x=y 


0x=y 


0x=y 


(a) 


(b) 


(c) 


Figure  1:  Weighting  £\  minimization  to  improve  sparse  signal  recovery,  (a)  Sparse  signal 
xq,  feasible  set  4>x  =  y,  and  l\  ball  of  radius  ||xo||ei-  (b)  There  exists  an  x  ^  Xq  for  which 
H^IUi  <  II^oIUj -  (c)  Weighted  t\  ball.  There  exists  no  x  ^  xq  for  which  HWeH^  <  ||IUxo|| Mi¬ 


hail  (Figure  1(a)),  this  ball  has  been  sharply  pinched  at  xq.  As  a  result,  the  interior  of  the  weighted 
i\  ball  does  not  intersect  the  feasible  set,  and  consequently,  (WPi)  will  find  the  correct  solution 
x *  =  xq.  Indeed,  it  is  not  difficult  to  show  that  the  same  statements  would  hold  true  for  any 
positive  weighting  matrix  for  which  W2  <  {w i  +  ws)/3.  Hence  there  is  a  range  of  valid  weights  for 
which  (WPi)  will  find  the  correct  solution.  As  a  rough  rule  of  thumb,  the  weights  should  relate 
inversely  to  the  true  signal  magnitudes. 


2.2  An  iterative  algorithm 

The  question  remains  of  how  a  valid  set  of  weights  may  be  obtained  without  first  knowing  xq.  As 
Figure  1  shows,  there  may  exist  a  range  of  favorable  weighting  matrices  W  for  each  fixed  xq,  which 
suggests  the  possibility  of  constructing  a  favorable  set  of  weights  based  solely  on  an  approximation 
x  to  xo  or  on  other  side  information  about  the  vector  magnitudes. 

We  propose  a  simple  iterative  algorithm  that  alternates  between  estimating  xo  and  redefining 
the  weights.  The  algorithm  is  as  follows: 

1.  Set  the  iteration  count  £  to  zero  and  =  1,  i  =  1, . . .  ,n. 

2.  Solve  the  weighted  £\  minimization  problem 

=  argmin  HhF^xll^  subject  to  y  =  <hx.  (6) 


3.  Update  the  weights:  for  each  i  =  1, . . . ,  n, 

,(<+i) 


w] 


x. 


Wi 


(7) 


4.  Terminate  on  convergence  or  when  £  attains  a  specified  maximum  number  of  iterations  £max. 
Otherwise,  increment  £  and  go  to  step  2. 
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We  introduce  the  parameter  e  >  0  in  step  3  in  order  to  provide  stability  and  to  ensure  that  a 
zero-valued  component  in  does  not  strictly  prohibit  a  nonzero  estimate  at  the  next  step.  As 
empirically  demonstrated  in  Section  3,  e  should  be  set  slightly  smaller  than  the  expected  nonzero 
magnitudes  of  xq.  In  general,  the  recovery  process  tends  to  be  reasonably  robust  to  the  choice  of  e. 

Using  an  iterative  algorithm  to  construct  the  weights  (wi)  tends  to  allow  for  successively  better 
estimation  of  the  nonzero  coefficient  locations.  Even  though  the  early  iterations  may  find  inaccurate 
signal  estimates,  the  largest  signal  coefficients  are  most  likely  to  be  identified  as  nonzero.  Once 
these  locations  are  identified,  their  influence  is  downweighted  in  order  to  allow  more  sensitivity  for 
identifying  the  remaining  small  but  nonzero  signal  coefficients. 

Figure  2  illustrates  this  dynamic  by  means  of  an  example  in  sparse  signal  recovery.  Figure  2(a) 
shows  the  original  signal  of  length  n  =  512,  which  contains  130  nonzero  spikes.  We  collect  m  =  256 
measurements  where  the  matrix  $  has  independent  standard  normal  entries.  We  set  e  =  0.1 
and  lma.v  =  2.  Figures  2(b)-(d)  show  scatter  plots,  coefficient-by-coefficient,  of  the  original  signal 
coefficient  xq  versus  its  reconstruction  In  the  unweighted  iteration  (Figure  2(b)),  we  see 

that  all  large  coefficients  in  xq  are  properly  identified  as  nonzero  (with  the  correct  sign),  and  that 
||xo  —  =  0.4857.  In  this  first  iteration,  ||x(0)||^0  =  256  =  to,  with  15  nonzero  spikes  in 

xq  reconstructed  as  zeros  and  141  zeros  in  xo  reconstructed  as  nonzeros.  These  numbers  improve 
after  one  reweighted  iteration  (Figure  2(c))  with  now  ||x  —  x^]]^  =  0.2407,  ||a;(1)||^0  =  256  = 
m,  6  nonzero  spikes  in  xq  reconstructed  as  zeros  and  132  zeros  in  xq  reconstructed  as  nonzeros. 
This  improved  signal  estimate  is  then  sufficient  to  allow  perfect  recovery  in  the  second  reweighted 
iteration  (Figure  2(d)). 


2.3  Analytical  justification 

The  iterative  reweighted  algorithm  falls  in  the  general  class  of  Majorization-Minimization  (MM) 
algorithms,  see  [36]  and  references  therein.  In  a  nutshell,  MM  algorithms  are  more  general  than 
EM  algorithms,  and  work  by  iteratively  minimizing  a  simple  surrogate  function  majorizing  a  given 
objective  function.  To  establish  this  connection,  consider  the  problem 


n 

min  )  log(|xd  +  e)  subject  to  y  =  4>x, 
2=1 


which  is  equivalent  to 


n 

min  log(uj  +  e)  subject  to 
i=  1 


y  =  <frx, 

|Xj|  <  Ui,  i  =  1, ...  ,71. 


(8) 

(9) 


The  equivalence  means  that  if  x *  is  a  solution  to  (8),  then  (x*,  |x*|)  is  a  solution  to  (9).  And 
conversely,  if  (x*,u*)  is  a  solution  to  (9),  then  x *  is  a  solution  to  (8). 

Problem  (9)  is  of  the  general  form 


min  g(v)  subject  to  v  £  C, 

V 

where  C  is  a  convex  set.  In  (9),  the  function  g  is  concave  and,  therefore,  below  its  tangent.  Thus, 
one  can  improve  on  a  guess  v  at  the  solution  by  minimizing  a  linearization  of  g  around  v.  This 
simple  observation  yields  the  following  MM  algorithm:  starting  with  £  C,  inductively  define 

u(£+i)  =  argrnin  g(v ^)  +  V g(v^)  ■  (v  —  v ^)  subject  to  v  E  C. 
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(a) 


(b) 


x 


0 


X 


0 


(c) 


(d) 


Figure  2:  Sparse  signal  recovery  through  reweighted  i\  iterations,  (a)  Original  length  n  = 
512  signal  Xq  with  130  spikes,  (b)  Scatter  plot ,  coefhcient-by-coefhcient,  of  xq  versus  its 
reconstruction  using  unweighted  t\  minimization,  (c)  Reconstruction  after  the  first 
reweighted  iteration,  (d)  Reconstruction  x ^  after  the  second  reweighted  iteration. 
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Each  iterate  is  now  the  solution  to  a  convex  optimization  problem.  In  the  case  (9)  of  interest,  this 
gives 


{x^+1\u^+1^)  =  argmin  ^  > 

i=  1  «,•  +  e 

which  is  of  course  equivalent  to 

n  i  - 

x(f:+l>  =  argmin 


subject  to  f=5X'  .  , 

\Xi\  <Ui,  1  =  1  , . . .  ,n, 


m,  . 

i= i  I  x\  I  +e 


subject  to  y  =  <3?x. 


One  now  recognizes  our  iterative  algorithm. 

In  a  Ph.D.  thesis  [37]  and  two  subsequent  papers  [38,39],  Fazel  et  al.  have  considered  the  same 
reweighted  i\  minimization  algorithm  as  in  Section  2.2,  first  as  a  heuristic  algorithm  for  applications 
in  portfolio  optimization  [38],  and  second  as  a  special  case  of  an  iterative  algorithm  for  minimizing 
the  rank  of  a  matrix  subject  to  convex  constraints  [39].  Using  general  theory,  they  argue  that 
SILi  l°g(|a^|  +  e)  converges  to  a  local  minimum  of  g(x)  =  X^=i  l°g(|^i|  +  e)  (note  that  this  not 
saying  that  the  sequence  (x^)  converges).  Because  the  log-sum  penalty  function  is  concave,  one 
cannot  expect  this  algorithm  to  always  find  a  global  minimum.  As  a  result,  it  is  important  to 
choose  a  suitable  starting  point  for  the  algorithm.  Like  [39],  we  have  suggested  initializing  with  the 
solution  to  (Pi),  the  unweighted  l\  minimization.  In  practice  we  have  found  this  to  be  an  effective 
strategy.  Further  connections  between  our  work  and  FOCUSS  strategies  are  discussed  at  the  end 
of  the  paper. 

The  connection  with  the  log-sum  penalty  function  provides  a  basis  for  understanding  why 
reweighted  t\  minimization  can  improve  the  recovery  of  sparse  signals.  In  particular,  the  log- 
sum  penalty  function  has  the  potential  to  be  much  more  sparsity-encouraging  than  the  i\  norm. 
Consider,  for  example,  three  potential  penalty  functions  for  scalar  magnitudes  t : 


fo(t)  =  Ift^op  fi(t)  =  |i|,  and  /iog,e(t)  oc  log(l  +  |t|/e), 

where  the  constant  of  proportionality  is  set  such  that  /iog,e(l)  =  1  =  /o(l)  =  /i(l),  see  Figure  3. 
The  first  (To-like)  penalty  function  /o  has  infinite  slope  at  t  =  0,  while  its  convex  (7) -like)  relaxation 
/i  has  unit  slope  at  the  origin.  The  concave  penalty  function  /iog,e,  however,  has  slope  at  the  origin 
that  grows  roughly  as  1/e  when  e  — >  0.  Like  the  (.q  norm,  this  allows  a  relatively  large  penalty  to 
be  placed  on  small  nonzero  coefficients  and  more  strongly  encourages  them  to  be  set  to  zero.  In 
fact,  f\0g,e(t)  tends  to  fo(t)  as  e  — >  0.  Following  this  argument,  it  would  appear  that  e  should  be  set 
arbitrarily  small,  to  most  closely  make  the  log-sum  penalty  resemble  the  l§  norm.  Unfortunately, 
as  e  — >  0,  it  becomes  more  likely  that  the  iterative  reweighted  i\  algorithm  will  get  stuck  in  an 
undesirable  local  minimum.  As  shown  in  Section  3,  a  cautious  choice  of  e  (slightly  smaller  than 
the  expected  nonzero  magnitudes  of  x )  provides  the  stability  necessary  to  correct  for  inaccurate 
coefficient  estimates  while  still  improving  upon  the  unweighted  t\  algorithm  for  sparse  recovery. 


2.4  Variations 

One  could  imagine  a  variety  of  possible  reweighting  functions  in  place  of  (7).  We  have  experimented 
with  alternatives,  including  a  binary  (large/small)  setting  of  Wi  depending  on  the  current  guess. 
Though  such  alternatives  occasionally  provide  superior  reconstruction  of  sparse  signals,  we  have 
found  the  rule  (7)  to  perform  well  in  a  variety  of  experiments  and  applications. 


Figure  3:  At  the  origin ,  the  canonical  £q  sparsity  count  fo(t)  is  better  approximated  by  the 
log-sum  penalty  function  /iog,e(f)  than  by  the  traditional  convex  relaxation  fi(t). 


Alternatively,  one  can  attempt  to  minimize  a  concave  function  other  than  the  log-sum  penalty. 
For  instance,  we  may  consider 

n 

g(x)  =  ^atan(|xj|/e) 

2=1 

in  lieu  of  YH=i  log(l  +  \xi\/e)-  The  function  atan  is  bounded  above  and  £o-like.  If  x  is  the  current 
guess,  this  proposal  updates  the  sequence  of  weights  as  Wi  =  l/(x2  +  e2).  There  are  of  course  many 
possibilities  of  this  nature  and  they  tend  to  work  well  (sometimes  better  than  the  log-sum  penalty). 
As  one  example,  Figueiredo  et  al.  [40,  41]  have  proposed  an  MM  algorithm  for  solving  penalized 
likelihood  signal  restoration  problems  of  the  form 

min  \\y  —  &x\\%  +  A||x||£,  0  <  p  <  1. 

Using  an  l\  penalty  to  majorize  the  nonconvex  iv  penalty,  the  resulting  algorithm  can  be  interpreted 
as  a  sequence  of  weighted  l\  minimization  problems,  with  each  set  of  weights  depending  on  the 
previous  iteration.  As  a  second  example,  Wipf  and  Nagarajan  [42]  have  proposed  an  iterative 
reweighted  t\  minimization  algorithm  for  a  feature  selection  problem  known  as  automatic  relevance 
determination  (ARD).  In  this  case,  it  is  also  shown  that  ARD  is  equivalent  to  solving  a  MAP 
estimation  problem  of  the  form 

nrin  ||y  —  &x\\%+g(x), 

x£M.n 

where  g{x)  is  a  concave,  non-decreasing  function  that  is  generally  not  factorable  into  individual 
components  Xj.  Zou  [43]  has  also  proposed  an  iterative  reweighted  version  of  the  LASSO  algorithm, 
while  Zou  and  Li  [44]  have  examined  connections  with  a  penalty  similar  to  the  log-sum.  Because 
of  space  limitations,  however,  we  will  limit  ourselves  to  empirical  studies  of  the  performance  of  the 
log-sum  penalty. 

2.5  Historical  progression 

The  development  of  the  reweighted  l\  algorithm  has  an  interesting  historical  parallel  with  the  use 
of  Iteratively  Reweighted  Least  Squares  (IRLS)  for  robust  statistical  estimation  [45-47].  Consider 
a  regression  problem  Ax  =  b  where  the  observation  matrix  A  is  overdetermined.  It  was  noticed  that 
standard  least  squares  regression,  in  which  one  minimizes  ||r||2  where  r  =  Ax  —  b  is  the  residual 
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vector,  lacked  robustness  vis  a  vis  outliers.  To  defend  against  this,  IRLS  was  proposed  as  an 
iterative  method  to  minimize  instead  the  objective 

min  ^  p(n(x)), 
i 

where  /?(•)  is  a  penalty  function  such  as  the  t\  norm  [45,48].  This  minimization  can  be  accomplished 
by  solving  a  sequence  of  weighted  least-squares  problems  where  the  weights  {rcj}  depend  on  the 
previous  residual  Wi  =  p'(ri)/ri.  For  typical  choices  of  p  this  dependence  is  in  fact  inversely 
proportional — large  residuals  will  be  penalized  less  in  the  subsequent  iteration  and  vice  versa — 
as  is  the  case  with  our  reweighted  i\  algorithm.  Interestingly,  just  as  IRLS  involved  iteratively 
reweighting  the  l^-norm  in  order  to  better  approximate  an  td-like  criterion,  our  algorithm  involves 
iteratively  reweighting  the  t \  -norm  in  order  to  better  approximate  an  I'o-like  criterion. 

3  Numerical  experiments 

We  present  a  series  of  experiments  demonstrating  the  benefits  of  reweighting  the  i\  penalty.  We 
will  see  that  the  requisite  number  of  measurements  to  recover  or  approximate  a  signal  is  typi¬ 
cally  reduced,  in  some  cases  by  a  substantial  amount.  We  also  demonstrate  that  the  reweighting 
approach  is  robust  and  broadly  applicable,  providing  examples  of  sparse  and  compressible  signal 
recovery,  noise-aware  recovery,  model  selection,  error  correction,  and  2-dimensional  total-variation 
minimization.  Meanwhile,  we  address  important  issues  such  as  how  one  can  choose  e  wisely  and 
how  robust  is  the  algorithm  to  this  choice,  and  how  many  reweighting  iterations  are  needed  for 
convergence. 

Throughout  Sections  3  and  4,  to  implement  Step  2  of  each  reweighted  algorithm,  we  use  the 
existing  solver  in  the  td-MAGIC  software  package  (available  at  www  .11-magic  .  org)  or  make  straight¬ 
forward  modifications  therein.  For  example,  the  optimization  problem  (6)  can  be  posed  equivalently 
as 

z^  =  argmin  \\z\\^  subject  to  y  =  ^(IF^)^1^, 

which  can  be  fed  directly  into  td-MAGIC;  one  must  then  let  the  signal  estimate  . 

The  algorithms  in  td-MAGIC  are  based  on  standard  primal-dual  and  log-barrier  methods  [1]  for 
solving  linear  programs  (LPs)  and  second-order  cone  programs  (SOCPs),  respectively.  To  give 
the  reader  a  sense  of  the  added  computational  complexity  over  an  unweighted  implementation, 
we  indicate  the  number  £max  of  reweighting  iterations  below — that  is,  the  total  number  of  calls  to 
Step  2  of  the  reweighted  algorithm.  The  computational  complexity  of  any  individual  call  to  Step  2 
depends  on  the  complexity  of  solving  a  single  LP  or  SOCP,  and  for  the  type  of  problems  common 
in  Compressive  Sensing,  the  structure  of  the  operator  often  facilitates  fast  solutions  using  on  the 
order  of  lOOOn  log  n  operations.  Other  fast  solvers  abound  that  could  also  be  adapted  to  reweighting 
(e.g.,  [49]);  in  general,  the  development  and  theory  of  fast  algorithms  for  tj -minimization  remains 
a  topic  of  active  research. 

3.1  Sparse  signal  recovery 

The  purpose  of  this  first  experiment  is  to  demonstrate  (1)  that  reweighting  reduces  the  necessary 
sampling  rate  for  sparse  signals  (2)  that  this  recovery  is  robust  with  respect  to  the  choice  of  e 
and  (3)  that  few  reweighting  iterations  are  typically  needed  in  practice.  The  setup  for  each  trial 
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4  Reweighting  Iterations 


Reweighting,  e  =  0.1 


Figure  4:  Sparse  signal  recovery  from  m  =  100  random  measurements  of  a  length  n  =  256 
signal.  The  probability  of  successful  recovery  depends  on  the  sparsity  level  k.  The  dashed 
curves  represent  a  reweighted  l\  algorithm  that  outperforms  the  traditional  unweighted  t\ 
approach  (solid  curve),  (a)  Performance  after  4  reweighting  iterations  as  a  function  of  e.  (b) 
Performance  with  fixed  e  =  0.1  as  a  function  of  the  number  of  reweighting  iterations. 


is  as  follows.  We  select  a  sparse  signal  xq  of  length  n  =  256  with  ||xolk0  =  k.  The  k  nonzero 
spike  positions  are  chosen  randomly,  and  the  nonzero  values  are  chosen  randomly  from  a  zero-mean 
unit-variance  Gaussian  distribution.  We  set  m  =  100  and  sample  a  random  m  x  n  matrix  with 
i.i.d.  Gaussian  entries,  giving  the  data  y  =  ‘bxo-  To  recover  the  signal,  we  run  several  reweighting 
iterations  with  equality  constraints  (see  Section  2.2).  The  parameter  e  remains  fixed  during  these 
iterations.  Finally,  we  run  500  trials  for  various  fixed  combinations  of  k  and  e. 

Figure  4(a)  compares  the  performance  of  unweighted  t\  to  reweighted  l\  for  various  values  of 
the  parameter  e.  The  solid  line  plots  the  probability  of  perfect  signal  recovery  (declared  when 
||®0  —  A\ioo  —  10”3)  for  the  unweighted  t\  algorithm  as  a  function  of  the  sparsity  level  k.  The 
dashed  curves  represent  the  performance  after  4  reweighted  iterations  for  several  different  values  of 
the  parameter  e.  We  see  a  marked  improvement  over  the  unweighted  i\  algorithm;  with  the  proper 
choice  of  e,  the  requisite  oversampling  factor  m/k  for  perfect  signal  recovery  has  dropped  from 
approximately  100/25  =  4  to  approximately  100/33  ~  3.  This  improvement  is  also  fairly  robust 
with  respect  to  the  choice  of  e,  with  a  suitable  rule  being  about  10%  of  the  standard  deviation  of 
the  nonzero  signal  coefficients. 

Figure  4(b)  shows  the  performance,  with  a  fixed  value  of  e  =  0.1,  of  the  reweighting  algorithm 
for  various  numbers  of  reweighted  iterations.  We  see  that  much  of  the  benefit  comes  from  the  first 
few  reweighting  iterations,  and  so  the  added  computational  cost  for  improved  signal  recovery  is 
quite  moderate. 

3.2  Sparse  and  compressible  signal  recovery  with  adaptive  choice  of  e 

We  would  like  to  confirm  the  benefits  of  reweighted  t\  minimization  for  compressible  signal  recovery 
and  consider  the  situation  when  the  parameter  e  is  not  provided  in  advance  and  must  be  estimated 
during  reconstruction.  We  propose  an  experiment  in  which  each  trial  is  designed  as  follows.  We 
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sample  a  signal  of  length  n  =  256  from  one  of  three  types  of  distribution:  (1)  k- sparse  with 
i.i.d.  Gaussian  entries,  (2)  fc-sparse  with  i.i.d.  symmetric  Bernoulli  ±1  entries,  or  (3)  compressible, 
constructed  by  randomly  permuting  the  sequence  {z~1//p}™=1  for  a  fixed  p,  applying  random  sign 
flips,  and  normalizing  so  that  H^olltoc  =  1;  that  is,  we  construct  the  ordered  sequence  {i~1/,p}”=1, 
multiply  each  entry  by  a  random  sign  and  then  apply  a  random  permutation  to  the  signed  sequence. 
The  output  is  xq.  We  set  m  =  128  and  sample  a  random  m  x  n  matrix  <f>  with  i.i.d.  Gaussian 
entries.  To  recover  the  signal,  we  again  solve  a  reweighted  t\  minimization  with  equality  constraints 
y  =  <hxo  =  <I>x.  In  this  case,  however,  we  adapt  e  at  each  iteration  as  a  function  of  the  current 
guess  x*% ;  step  3  of  the  algorithm  is  modified  as  follows: 

3.  Let  (|x|(,))  denote  a  reordering  of  (|xj|)  in  decreasing  order  of  magnitude.  Set 

e  =  max  j|xW|(io),  l(T3j  , 

where  iq  =  m/[41og(n/m)j.  Define  w^+1^  as  in  (7). 

Our  motivation  for  choosing  this  value  for  e  is  based  on  the  anticipated  accuracy  of  i\  minimization 
for  arbitrary  signal  recovery.  In  general,  the  reconstruction  quality  afforded  by  t\  minimization  is 
comparable  (approximately)  to  the  best  io-term  approximation  to  xq,  and  so  we  expect  approxi¬ 
mately  this  many  signal  components  to  be  approximately  correct.  Put  differently,  we  cannot  really 
expect  to  recover  the  components  of  the  signal  whose  absolute  value  is  below  |x^)|(io).  This  moti¬ 
vates  a  choice  of  e  on  the  order  of  Suppose,  however,  that  the  recovered  signal  is  sparser 

than  this  and  has  fewer  than  %  nonzero  terms.  Then  e  would  be  zero  and  some  of  the  weights 
would  be  infinite.  To  prevent  this  from  occurring,  we  put  a  lower  cap  on  e  and  thus  an  upper  limit 
on  the  value  of  the  weights.  The  choice  e  >  10~3  may  appear  a  little  arbitrary  but  works  well  in 
practice. 

We  run  100  trials  of  the  above  experiment  for  each  signal  type.  The  results  for  the  fc-sparse 
experiments  are  shown  in  Figure  5(a).  The  solid  black  line  indicates  the  performance  of  unweighted 
t\  recovery  (success  is  declared  when  ||xo  —  xW^  <  10~3).  This  curve  is  the  same  for  both  the 
Gaussian  and  Bernoulli  coefficients,  as  the  success  or  failure  of  unweighted  t\  minimization  depends 
only  on  the  support  and  sign  pattern  of  the  original  sparse  signal.  The  dashed  curves  indicate  the 
performance  of  reweighted  i\  minimization  for  Gaussian  coefficients  (blue  curve)  and  Bernoulli 
coefficients  (red  curve)  with  £max  =  4.  We  see  a  substantial  improvement  for  recovering  sparse 
signals  with  Gaussian  coefficients,  yet  we  see  only  very  slight  improvement  for  recovering  sparse 
signals  with  Bernoulli  coefficients.  This  discrepancy  likely  occurs  because  the  decay  in  the  sparse 
Gaussian  coefficients  allows  large  coefficients  to  be  easily  identified  and  significantly  downweighted 
early  in  the  reweighting  algorithm.  With  Bernoulli  coefficients  there  is  no  such  “low-hanging  fruit” . 

The  results  for  compressible  signals  are  shown  in  Figure  5(b), (c).  Each  plot  represents  a  his¬ 
togram,  over  100  trials,  of  the  i 2  reconstruction  error  improvement  afforded  by  reweighting,  namely, 
||xo  —  \\i2/\\xq  —  x^  1 1 £2  •  We  see  the  greatest  improvements  for  smaller  p  corresponding  to  sparser 
signals,  with  reductions  in  i 2  reconstruction  error  up  to  50%  or  more.  As  p  — >  1,  the  improvements 
diminish. 

3.3  Recovery  from  noisy  measurements 

Reweighting  can  be  applied  to  a  noise-aware  version  of  t\  minimization,  further  improving  the 
recovery  of  signals  from  noisy  data.  We  observe  y  =  d>x’o  +  2,  where  2:  is  a  noise  term  which  is  either 


12 


(a)  (b)  (c) 


Figure  5:  ( a )  Improvements  in  sparse  signal  recovery  from  reweighted  i\  minimization  when 
compared  to  unweighted  t\  minimization  (solid  black  curve).  The  dashed  blue  curve  corre¬ 
sponds  to  sparse  signals  with  Gaussian  coefficients;  the  dashed  red  curve  corresponds  to  sparse 
signals  with  Bernoulli  coefficients.  (b),(c)  Improvements  in  compressible  signal  recovery  from 
reweiglited  t\  minimization  when  compared  to  unweighted  t\  minimization;  signal  coefficients 
decay  as  with  (b)  p  =  0.4  and  (c)  p  =  0.7.  Histograms  indicate  the  £ 2  reconstruction 

error  improvements  afforded  by  the  reweighted  algorithm. 


stochastic  or  deterministic.  To  recover  xo,  we  adapt  quadratically-constrained  £\  minimization  [7, 
27],  and  modify  step  2  of  the  reweighted  l\  algorithm  with  equality  constraints  (see  Section  2.2)  as 

x^  =  argmin  ||W^x||4  subject  to  \\y  —  <Fx||^2  <  <5.  (10) 

The  parameter  5  is  adjusted  so  that  the  true  vector  xo  be  feasible  (resp.  feasible  with  high  proba¬ 
bility)  for  (10)  in  the  case  where  z  is  deterministic  (resp.  stochastic). 

To  demonstrate  how  this  proposal  improves  on  plain  £\  minimization,  we  sample  a  vector  of 
length  n  =  256  from  one  of  three  types  of  distribution:  (1)  ^-sparse  with  k  =  38  and  i.i.d.  Gaussian 
entries,  (2)  k- sparse  with  k  =  38  and  i.i.d.  symmetric  Bernoulli  ±1  entries,  or  (3)  compressible, 
constructed  by  randomly  permuting  the  sequence  {z-1/p}f=1  for  a  fixed  p,  applying  random  sign 
flips,  and  normalizing  so  that  Hxoll^  =  1.  The  matrix  is  128  x  256  with  i.i.d.  Gaussian  entries 
whose  columns  are  subsequently  normalized,  and  the  noise  vector  z  is  drawn  from  an  i.i.d.  Gaussian 
zero-mean  distribution  properly  rescaled  so  that  \\z\\e2  =  /?||$x||^2  with  (3  =  0.2;  i.e.,  2  =  ozq  where 
zo  is  standard  white  noise  and  o  =  /3 1 1  <J>x  1 1  £2  / 1 1  1 1  •  The  parameter  <5  is  set  to  5 2  =  u2(m  +  2\/2 m) 

as  this  provides  a  likely  upper  bound  on  ||z||£2.  In  details,  the  probability  that  \\z\\g  exceeds  52  is  the 
probability  that  a  chi  square  with  m  degrees  of  freedom  exceeds  its  mean  by  at  least  two  standard 
deviations.  This  quantity  is  about  2.5%  when  m  is  not  too  small  thanks  to  the  approximation  given 
by  the  central  limit  theorem.  We  set  e  to  be  the  empirical  maximum  value  of  || over  several 
realizations  of  a  random  vector  £  ~  jV(0,  cr2Im );  for  the  values  m  =  128  and  /3  =  0.2  we  have  e  ~  1. 
(This  gives  a  rough  estimate  for  the  noise  amplitude  in  the  signal  domain,  and  hence,  a  baseline 
above  which  significant  signal  components  could  be  identified.) 

We  run  100  trials  for  each  signal  type.  Figure  6  shows  histograms  of  the  1 2  reconstruction  error 
improvement  afforded  by  9  iterations,  i.e.,  each  histogram  documents  ||xo  —  |U2/||xo  — 

over  100  trials.  We  see  in  these  experiments  that  the  reweighted  quadratically-constrained  l\ 
minimization  typically  offers  improvements  ||xo  —  x^9)  ||^2 / ||xo  ~  x^0)  ||^2  in  the  range  0.5  —  1  in  many 
examples.  The  results  for  sparse  Gaussian  spikes  are  slightly  better  than  for  sparse  Bernoulli  spikes, 
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Gaussian  coeffs,  k  =  38 


0.4  0.6  0.8  1 


(a) 


Bernoulli  coeffs,  k  =  38  Compressible  signal,  p=0.7 


(b)  (c) 


Figure  6:  Sparse  and  compressible  signal  reconstruction  from  noisy  measurements.  His¬ 
tograms  indicate  the  £2  reconstruction  error  improvements  afforded  by  the  reweighted 
quadratically-constrained  i\  minimization  for  various  signal  types. 


though  both  are  generally  favorable.  Similar  behavior  holds  for  compressible  signals,  and  we  have 
observed  that  smaller  values  of  p  (sparser  signals)  allow  the  most  improvement. 

3.4  Statistical  estimation 

Reweighting  also  enhances  statistical  estimation  as  well.  Suppose  we  observe  y  =  <3?xo  +  z,  where 
is  m  x  n  with  m  <  n,  and  2  is  a  noise  vector  2  ~  A7(0,  rr2/m )  drawn  from  an  i.i.d.  Gaussian 
zero-mean  distribution,  say.  To  estimate  xo,  we  adapt  the  Dantzig  selector  [50]  and  modify  step  2 
of  the  reweighted  l\  algorithm  as 

x ^  =  argmin  HW^xll^  subject  to  ||3>*(y  —  4>x)||^oo  <  5.  (11) 

Again  5  is  a  parameter  making  sure  that  the  true  unknown  vector  is  feasible  with  high  probability. 

To  judge  this  proposal,  we  consider  a  sequence  of  experiments  in  which  xq  is  of  length  n  =  256 
with  k  =  8  nonzero  entries  in  random  positions.  The  nonzero  entries  of  xq  have  i.i.d.  entries  ac¬ 
cording  to  the  model  x,  =  Sj(l  +  |aj|)  where  the  sign  Sj  =  ±1  with  probability  1/2  and  a*  ~  jV(0, 1). 
The  matrix  4>  is  72  x  256  with  i.i.d.  Gaussian  entries  whose  columns  are  subsequently  normalized 
just  as  before.  The  noise  vector  (2*)  has  i.i.d.  jV(0,  <r2 )  components  with  o  =  1  /Zyjk/m  ~  0.11. 
The  parameter  <5  is  set  to  be  the  empirical  maximum  value  of  ||4'*2||^00  over  several  realizations  of 
a  random  vector  2  rs_/  We  set  e  =  0.1. 

After  each  iteration  of  the  reweighted  Dantzig  selector,  we  also  refine  our  estimate  using 
the  Gauss-Dantzig  technique  to  correct  for  a  systematic  bias  [50].  Let  I  =  {i  :  |x^|  >  a  ■  <7}  with 
a  =  1/4.  Then  one  substitutes  with  the  least  squares  estimate  which  solves 

min  ||y  —  4>x||^2  subject  to  Xi  =  0,  i  ^  /; 

that  is,  by  regressing  y  onto  the  subset  of  columns  indexed  by  I. 

We  first  report  on  one  trial  with  £mSLX  =  4.  Figure  7(a)  shows  the  original  signal  xo  along  with  the 
recovery  using  the  first  (unweighted)  Dantzig  selector  iteration;  the  error  is  ||xo  —  x^  ||^2  =  1.46. 
Figure  7(b)  shows  the  Dantzig  selector  recovery  after  4  iterations;  the  error  has  decreased  to 
||xq  —  x(4)||^2  =  1.25.  Figure  7(c)  shows  the  Gauss-Dantzig  estimate  a/°)  obtained  from  the  first 
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Unweighted 

Gauss-Dantzig 

Reweighted 

Gauss-Dantzig 

Median  error  ratio  p2 

2.43 

1.21 

Mean  error  ratio  p2 

6.12 

5.63 

Avg.  false  positives 

3.25 

0.50 

Avg.  correct  detections 

7.86 

7.80 

Table  1:  Model  selection  results  for  unweighted  and  reweighted  versions  of  the  Gauss-Dantzig 
estimator.  In  each  of  5000  trials  the  true  sparse  model  contains  k  =  8  nonzero  entries. 


(unweighted)  Dantzig  selector  iteration;  this  decreases  the  error  to  ||x’o  —  x^\\£2  =  0.57.  The 
estimator  correctly  includes  all  8  positions  at  which  xq  is  nonzero,  but  also  incorrectly  includes  4 
positions  at  which  xq  should  be  zero.  In  Figure  7(d)  we  see,  however,  that  all  of  these  mistakes 
are  rectified  in  the  Gauss-Dantzig  estimate  x obtained  from  the  reweighted  Dantzig  selector;  the 
total  error  also  decreases  to  ||xo  —  x^  \\^2  =  0.29. 

We  repeat  the  above  experiment  across  5000  trials.  Figure  8  shows  a  histogram  of  the  ratio  p 2 
between  the  squared  error  loss  of  some  estimate  x  and  the  ideal  squared  error 

2  ,=  EIUOu  -  x04)'2 

Er=imin04 

for  both  the  unweighted  and  reweighted  Gauss-Dantzig  estimators.  (The  results  are  also  summa¬ 
rized  in  Table  1.)  For  an  interpretation  of  the  denominator,  the  ideal  squared  error  min(xg  ^  a2) 
is  roughly  the  mean-squared  error  one  could  achieve  if  one  had  available  an  oracle  supplying  perfect 
information  about  which  coordinates  of  xq  are  nonzero,  and  which  are  actually  worth  estimating. 
We  see  again  a  significant  reduction  in  reconstruction  error;  the  median  value  of  p 2  decreases  from 
2.43  to  1.21.  As  pointed  out,  a  primary  reason  for  this  improvement  comes  from  a  more  accurate 
identification  of  significant  coefficients:  on  average  the  unweighted  Gauss-Dantzig  estimator  in¬ 
cludes  3.2  “false  positives,”  while  the  reweighted  Gauss-Dantzig  estimator  includes  only  0.5.  Both 
algorithms  correctly  include  all  8  nonzero  positions  in  a  large  majority  of  trials. 
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Figure  7:  Reweighting  the  Dantzig  selector.  Blue  asterisks  indicate  the  original  signal  Xq; 
red  circles  indicate  the  recovered  estimate,  (a)  Unweighted  Dantzig  selector,  (b)  Reweighted 
Dantzig  selector,  (c)  Unweighted  Gauss-Dantzig  estimate,  (d)  Reweighted  Gauss-Dantzig 
estimate. 
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Figure  8:  Histogram  of  the  ratio  p 2  between  the  squared  error  loss  and  the  ideal  squared 
error  for  (a)  unweighted  Gauss-Dantzig  estimator  and  (b)  reweighted  G auss-D ant zig  estimator. 
Approximately  5%  of  the  tail  of  each  histogram  has  been  truncated  for  display;  across  5000 
trials  the  maximum  value  observed  was  p 2  ss  165. 

3.5  Error  correction 

Suppose  we  wish  to  transmit  a  real- valued  signal  xo  £  a  block  of  n  pieces  of  information,  to 
a  remote  receiver.  The  vector  xo  is  arbitrary  and  in  particular,  nonsparse.  The  difficulty  is  that 
errors  occur  upon  transmission  so  that  a  fraction  of  the  transmitted  codeword  may  be  corrupted 
in  a  completely  arbitrary  and  unknown  fashion.  In  this  setup,  the  authors  in  [31]  showed  that 
one  could  transmit  n  pieces  of  information  reliably  by  encoding  the  information  as  <f>xo  where 
<J>  E  Mmxn,  m  >  n,  is  a  suitable  coding  matrix,  and  by  solving 

min  \\v~  $x||*i  (12) 

l£l" 

upon  receiving  the  corrupted  codeword  y  =  <hxo  +  e;  here,  e  is  the  unknown  but  sparse  corruption 
pattern.  The  conclusion  of  [31]  is  then  that  the  solution  to  this  program  recovers  xo  exactly  provided 
that  the  fraction  of  errors  is  not  too  large.  Continuing  on  our  theme,  one  can  also  enhance  the 
performance  of  this  error-correction  strategy,  further  increasing  the  number  of  corrupted  entries 
that  can  be  overcome. 

Select  a  vector  of  length  n  =  128  with  elements  drawn  from  a  zero- mean  unit- variance  Gaussian 
distribution,  and  sample  an  mxn  coding  matrix  <h  with  i.i.d.  Gaussian  entries  yielding  the  codeword 
<Fx.  For  this  experiment,  m  =  4n  =  512,  and  k  random  entries  of  the  codeword  are  corrupted  with 
a  sign  flip.  For  the  recovery,  we  simply  use  a  reweighted  version  of  (12).  Our  algorithm  is  as  follows: 

1.  Set  l  =  0  and  vaf1  =  1  for  i  =  1, 2, . . . ,  m. 

2.  Solve  the  weighted  l\  minimization  problem 

xW  =argmin||  W^\y  —  ^x)\\^.  (13) 
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Figure  9:  Unweighted  (solid  curve)  and  reweighted  (dashed  curve)  l\  signal  recovery  from 
corrupted  measurements  y  =  $£0  +  e.  The  signal  Xq  has  length  n  =  128,  the  codeword  y  has 
size  m  =  4n  =  512,  and  the  number  of  corrupted  entries  ||e||^0  =  k. 


3.  Update  the  weights;  let  =  y  —  and  for  each  i  =  1, . . . ,  m,  define 


(m) 

i 


1 


+  e 


(14) 


4.  Terminate  on  convergence  or  when  f  attains  a  specified  maximum  number  of  iterations  £max. 
Otherwise,  increment  l  and  go  to  step  2. 

We  set  e  to  be  some  factor  /3  times  the  standard  deviation  of  the  corrupted  codeword  y.  We  run 
100  trials  for  several  values  of  /3  and  of  the  size  k  of  the  corruption  pattern. 

Figure  9  shows  the  probability  of  perfect  signal  recovery  (declared  when  ||.to  —  <  10-3) 

for  both  the  unweighted  decoding  algorithm  and  the  reweighted  versions  for  various  values  of  (5 
(with  =  4).  Across  a  wide  range  of  values  (3  (and  hence  e),  we  see  that  reweighting  increases 
the  number  of  corrupted  entries  (as  a  percentage  of  the  codeword  size  m)  that  can  be  overcome, 
from  approximately  28%  to  35%. 


3.6  Total  variation  minimization  for  sparse  image  gradients 

In  a  different  direction,  reweighting  can  also  boost  the  performance  of  total-variation  (TV)  min¬ 
imization  for  recovering  images  with  sparse  gradients.  Recall  the  total-variation  norm  of  a  2- 
dimensional  array  (x'jj),  1  <  i,j  <  n,  defined  as  the  t\  norm  of  the  magnitudes  of  the  discrete 
gradient, 

IMItv  =  X! 

where  (Dx)ij  is  the  2-dimensional  vector  of  forward  differences  ( Dx)ij  =  (xj+i,j  —  Xij,  Xij+i 
when  1  <  i,j  <  n  —  1  while  at  the  boundary,  ( Dx)nj  =  (0,xnj+i  —  xnj )  for  1  <  j  <  n  —  1, 
(Dx)i)U  =  (xj+i,n  —  %i,n,  0)  for  1  <  i  <  n  —  1,  and  ( Dx)n^n  =  (0,0).  Because  many  natural  images 
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have  a  sparse  or  nearly  sparse  gradient,  it  makes  sense  to  search  for  the  reconstruction  with  minimal 
TV  norm,  i.e., 

min  1 1 cc 1 1 tv  subject  to  y  =  4>x;  (15) 

see  [9,  10],  for  example.  It  turns  out  that  this  problem  can  be  recast  as  a  second-order  cone 
program  [51],  and  thus  solved  efficiently.  The  insight  here  is  that  the  total  variation  of  x  is  equal 
to  the  optimal  value  of  the  optimization  problem 

min  ^2  Uij  subject  to  \\(Dx)ij\\  <  Uij, 
l<i 

with  optimization  variables  {uij}i<ij<n.  When  searching  for  an  array  x  minimizing  the  total 
variation  under  linear  equality  constraints,  we  introduce  the  slack  variables  Uij  and  minimize 
Ylijui,j  under  the  constraints  ||(.Dx)ij||  <  u^j  and  the  linear  equalities  for  x;  the  constraints 
||  (.Dxjijjl  <  uhj  are  second-order  cone  constraints. 

We  adapt  (15)  by  minimizing  a  sequence  of  weighted  TV  norms  as  follows: 

1.  Set  t  =  0  and  =  1,  1  <  i,j  <  n. 

2.  Solve  the  weighted  TV  minimization  problem 

x^=argmin  wfj  \\(Dx)i)j\\  subject  to  y  =  <Fx. 

l<ij<n 


3.  Update  the  weights;  for  each  1  <  i.j  <  n, 


w. 


(<+i) 


ll(Oi<,))« 


(16) 


4.  Terminate  on  convergence  or  when  t  attains  a  specified  maximum  number  of  iterations  £max. 

Otherwise,  increment  t  and  go  to  step  2. 

Naturally,  this  iterative  algorithm  corresponds  to  minimizing  a  sequence  of  linearizations  of  the 
log-sum  function  J2i<ij<n  log(||(-Dx)jj||  +  e)  around  the  previous  signal  estimate. 

To  show  how  this  can  enhance  the  performance  of  the  recovery,  consider  the  following  exper¬ 
iment.  Our  test  image  is  the  Shepp-Logan  phantom  of  size  n  =  256  x  256  (see  Figure  10(a)). 
The  pixels  take  values  between  0  and  1,  and  the  image  has  a  nonzero  gradient  at  2184  pixels.  We 
measure  y  by  sampling  the  discrete  Fourier  transform  of  the  phantom  along  10  pseudo-radial  lines 
(see  Figure  10(b)).  That  is,  y  =  4>xo,  where  represents  a  subset  of  the  Fourier  coefficients  of  xq. 
In  total,  we  take  m  =  2521  real-valued  measurements. 

Figure  10(c)  shows  the  result  of  the  classical  TV  minimization,  which  gives  a  relative  error 
equal  to  ||xo  —  H^/H^oll^  ~  0.43.  As  shown  in  Figure  10(d),  however,  we  see  a  substantial 
improvement  after  6  iterations  of  reweighted  TV  minimization  (we  used  0.1  for  the  value  of  e).  The 
recovery  is  near-perfect,  with  a  relative  error  obeying  ||xo  —  x(6) ||^2/||xo||^2  ~  2  x  10-3. 

For  point  of  comparison  it  takes  approximately  17  radial  lines  (m  =  4257  real-valued  measure¬ 
ments)  to  perfectly  recover  the  phantom  using  unweighted  TV  minimization.  Hence,  with  respect  to 
the  sparsity  of  the  image  gradient,  we  have  reduced  the  requisite  oversampling  factor  significantly, 
from  ~  1.95  down  to  ||||  ~  1.15.  It  is  worth  noting  that  comparable  reconstruction  perfor¬ 
mance  on  the  phantom  image  has  also  been  recently  achieved  by  directly  minimizing  a  nonconvex 
norm,  p  <  1,  of  the  image  gradient  [52];  we  discuss  this  approach  further  in  Section  5.1. 
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Figure  10:  Image  recovery  from  reweighted  TV  minimization,  (a)  Original  256  x  256  phantom 
image,  (b)  Fourier- domain  sampling  pattern,  (c)  Minimum-TV  reconstruction;  total  variation 
=  1336.  (d)  Reweighted  TV  reconstruction;  total  variation  (unweighted)  =  1464. 


4  Reweighted  t\  analysis 

In  many  problems,  a  signal  may  assume  sparsity  in  a  possibly  overcomplete  representation.  To 
make  things  concrete,  suppose  we  are  given  a  dictionary  T  of  waveforms  (ifj)jeJ  (tlue  columns 
of  T)  which  allows  representing  any  signal  as  i  =  Ta.  The  representation  a  is  deemed  sparse 
when  the  vector  of  coefficients  a  has  comparably  few  significant  terms.  In  some  applications,  it 
may  be  natural  to  choose  T  as  an  orthonormal  basis  but  in  others,  a  sparse  representation  of  the 
signal  x  may  only  become  possible  when  T  is  a  redundant  dictionary;  that  is,  it  has  more  columns 
than  rows.  A  good  example  is  provided  by  an  audio  signal  which  often  is  sparsely  represented  as 
a  superposition  of  waveforms  of  the  general  shape  a ~1^2g((t  —  to)/a)elult,  where  to,  uj,  and  a  are 
discrete  shift,  modulation  and  scale  parameters. 

In  this  setting,  the  common  approach  for  sparsity-based  recovery  from  linear  measurements 
goes  by  the  name  of  Basis  Pursuit  [8]  and  is  of  the  form 

min  HaH^  subject  to  y  =  (17) 

that  is,  we  seek  a  sparse  set  of  coefficients  a  that  synthesize  the  signal  x  =  'fa.  We  call  this 
synthesis-based  t\  recovery.  A  far  less  common  approach,  however,  seeks  a  signal  x  whose  coefficients 
a  =  'l!*x  (when  x  is  analyzed  in  the  dictionary  T)  are  sparse 

min  ||T*x||£1  subject  to  y  =  <Fx.  (18) 

We  call  this  analysis-based  t\  recovery.  When  T  is  an  orthonormal  basis,  these  two  programs  are 
identical,  but  in  general  they  find  different  solutions.  When  T  is  redundant,  (18)  involves  fewer 
unknowns  than  (17)  and  may  be  computationally  simpler  to  solve  [53].  Moreover,  in  some  cases 
the  analysis-based  reconstruction  may  in  fact  be  superior,  a  phenomenon  which  is  not  very  well 
understood;  see  [54]  for  some  insights. 

Both  programs  are  amenable  to  reweighting  but  what  is  interesting  is  the  combination  of 
analysis-based  i\  recovery  and  iterative  reweighting  which  seems  especially  powerful.  This  section 
provides  two  typical  examples.  For  completeness,  the  iterative  reweighted  ^-analysis  algorithm  is 
as  follows: 

1.  Set  £  =  0  and  vrp  =  1,  j  E  .7  (J  indexes  the  dictionary). 
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2.  Solve  the  weighted  t\  minimization  problem 

=  argrnin  || W^^xW^  subject  to  y  =  4>x. 


3.  Put  =  4/*xW  and  define 


w 


(<+i) 


1“}  l+e 


j  e  J. 


4.  Terminate  on  convergence  or  when  £  attains  a  specified  maximum  number  of  iterations  £n 
Otherwise,  increment  £  and  go  to  step  2. 

Unless  specified  otherwise,  we  will  use  this  algorithm  in  the  remainder  of  this  section. 


4.1  Incoherent  sampling  of  radar  pulses 

Our  first  example  is  motivated  by  our  own  research  focused  on  advancing  devices  for  analog- 
to-digital  conversion  of  high-bandwidth  signals.  To  cut  a  long  story  short,  standard  analog-to- 
digital  converter  (ADC)  technology  implements  the  usual  quantized  Shannon  representation;  that 
is,  the  signal  is  uniformly  sampled  at  or  above  the  Nyquist  rate.  The  hardware  brick  wall  is 
that  conventional  analog-to-digital  conversion  technology  is  currently  limited  to  sample  rates  on 
the  order  of  1GHz,  and  hardware  implementations  of  high  precision  Shannon-based  conversion 
at  substantially  higher  rates  seem  out  of  sight  for  decades  to  come.  This  is  where  the  theory  of 
Compressive  Sensing  becomes  relevant. 

Whereas  it  may  not  be  possible  to  digitize  an  analog  signal  at  a  very  high  rate,  it  may  be 
quite  possible  to  change  its  polarity  at  a  high  rate.  The  idea  is  then  to  multiply  the  signal  by  a 
pseudo-random  sequence  of  plus  and  minus  ones,  integrate  the  product  over  time  windows,  and 
digitize  the  integral  at  the  end  of  each  time  interval.  This  is  a  parallel  architecture  and  one  has 
several  of  these  random  multiplier-integrator  pairs  running  in  parallel  using  distinct  or  event  nearly 
independent  pseudo-random  sign  sequences. 

To  show  the  promise  of  this  approach,  we  take  xq  to  be  a  1-D  signal  of  length  n  =  512  which 
is  a  superposition  of  two  modulated  pulses  (see  Figure  11(a)).  From  this  signal,  we  collect  m  =  30 
measurements  using  an  m  x  n  matrix  <F  populated  with  i.i.d.  Bernoulli  ±1  entries.  This  is  an 
unreasonably  small  amount  of  data  corresponding  to  an  undersampling  factor  exceeding  17.  For 
reconstruction  we  consider  a  time-frequency  Gabor  dictionary  that  consists  of  a  variety  of  sine 
waves  modulated  by  Gaussian  windows,  with  different  locations  and  scales.  For  each  location  and 
scale,  the  corresponding  set  of  Gabor  coefficients  may  be  computed  simply  by  multiplying  the  signal 
by  the  appropriate  Gaussian  window  and  then  applying  a  zero-padded  FFT.  (The  transpose  of  this 
operation  simply  requires  an  inverse  FFT,  truncation,  and  multiplication  by  the  same  windowing 
function.)  Overall  the  dictionary  is  approximately  43x  overcomplete  and  does  not  contain  the  two 
pulses  that  comprise  xq. 

Figure  11(b)  shows  the  result  of  minimizing  £\  synthesis  (17)  in  this  redundant  dictionary. 
The  reconstruction  shows  pronounced  artifacts  and  ||xo  —  ^H^/IMIu  ~  0.67.  These  artifacts  are 
somewhat  reduced  by  analysis-based  £\  recovery  (18),  as  demonstrated  in  Figure  11(c);  here,  see 
||®o  —  ~  0.46.  However,  reweighting  the  £\  analysis  problem  offers  a  very  substantial 

improvement.  Figure  11(d)  shows  the  result  after  four  iterations;  ||xo  —  H^/IMI^  is  now  about 
0.022.  Further,  Table  2  shows  the  relative  reconstruction  error  ||xo  —  x^  ||r2/||x'lk2  as  a  function  of 
the  iteration  count  £.  Massive  gains  are  achieved  after  just  4  iterations. 
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Figure  11:  (a)  Original  two-pulse  signal  (blue)  and  reconstructions  (red)  via  (b)  t\  synthesis, 
(c)  t\  analysis,  (d)  reweighted  t\  analysis. 


Iteration  count  l 

0 

1 

2 

3 

4 

5 

6 

7 

Error  ||z0  ~  Ik/IMk 

0.460 

0.101 

0.038 

0.024 

0.022 

0.022 

0.022 

0.022 

Table  2:  Relative  i 2  reconstruction  error  as  a  function  of  reweighting  iteration  for  two-pulse 
signal  reconstruction. 
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4.2  Frequency  sampling  of  biomedical  images 

Compressive  Sensing  can  help  reduce  the  scan  time  in  Magnetic  Resonance  Imaging  (MRI)  and 
offer  sharper  images  of  living  tissues.  This  is  especially  important  because  time  consuming  MRI 
scans  have  traditionally  limited  the  use  of  this  sensing  modality  in  important  applications.  Simply 
put,  faster  imaging  here  means  novel  applications.  In  MR,  one  collects  information  about  an  object 
by  measuring  its  Fourier  coefficients  and  faster  acquisition  here  means  fewer  measurements. 

We  mimic  an  MR  experiment  by  taking  our  unknown  image  xo  to  be  the  n  =  256  x  256  =  65536 
pixel  MR  angiogram  image  shown  in  Figure  12(a).  We  sample  the  image  along  80  lines  in  the  Fourier 
domain  (see  Figure  12(b)),  effectively  taking  m  =  18737  real-valued  measurements  y  =  4>xo.  In 
plain  terms,  we  undersample  by  a  factor  of  about  3. 

Figure  12(c)  shows  the  minimum  energy  reconstruction  which  solves 

min  |jx||^2  subject  to  y  =  <3?x.  (19) 

Figure  12(d)  shows  the  result  of  TV  minimization.  The  minimum  ^-analysis  (18)  solution  where 
T  is  a  three-scale  redundant  D4  wavelet  dictionary  that  is  10  times  overcomplete,  is  shown  on 
Figure  12(e).  Figure  12(f)  shows  the  result  of  reweighting  the  i\  analysis  with  £max  =  4  and  e 
set  to  100.  For  a  point  of  comparison,  the  maximum  wavelet  coefficient  has  amplitude  4020,  and 
approximately  108000  coefficients  (out  of  655360)  have  amplitude  greater  than  100. 

We  can  reinterpret  these  results  by  comparing  the  reconstruction  quality  to  the  best  fc-term 
approximation  to  the  image  xo  in  a  nonredundant  wavelet  dictionary.  For  example,  an  I2  re¬ 
construction  error  equivalent  to  the  I2  reconstruction  of  Figure  12(c)  would  require  keeping  the 
k  =  1905  ~  m/9.84  largest  wavelet  coefficients  from  the  orthogonal  wavelet  transform  of  our  test 
image.  In  this  sense,  the  requisite  oversampling  factor  can  be  thought  of  as  being  9.84.  Of  course 
this  can  be  substantially  improved  by  encouraging  sparsity,  and  the  factor  is  reduced  to  3.33  using 
TV  minimization,  to  3.25  using  t\  analysis,  and  to  3.01  using  reweighted  t\  analysis. 

We  would  like  to  be  clear  about  what  this  means.  Consider  the  image  in  Figure  12(a)  and 
its  best  fc-term  wavelet  approximation  with  k  =  6225;  that  is,  the  approximation  obtained  by 
computing  all  the  D4  wavelet  coefficients  and  retaining  the  k  largest  in  the  expansion  of  the  object 
(and  throwing  out  the  others).  Then  we  have  shown  that  the  image  obtained  by  measuring  3 k 
real-valued  Fourier  measurements  and  solving  the  iterative  reweighted  t\  analysis  has  just  about 
the  same  accuracy.  That  is,  the  oversampling  factor  needed  to  obtain  an  image  of  the  same  quality 
as  if  one  knew  ahead  of  time  the  locations  of  the  k  most  significant  pieces  of  information  and  their 
value,  is  just  3. 

5  Discussion 

In  summary,  reweighted  t\  minimization  outperforms  plain  t\  minimization  in  a  variety  of  setups. 
Therefore,  this  technique  might  be  of  interest  to  researchers  in  the  field  of  Compressive  Sensing 
and/or  statistical  estimation  as  it  might  help  to  improve  the  quality  of  reconstructions  and/or 
estimations.  Further,  this  technique  is  easy  to  deploy  as  (1)  it  can  be  built  on  top  of  existing  i\ 
solvers  and  (2)  the  number  of  iterations  is  typically  very  low  so  that  the  additional  computational 
cost  is  not  prohibitive.  We  conclude  this  paper  by  discussing  related  work  and  possible  future 
directions. 
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Figure  12:  (a)  Original  MR  angiogram,  (b)  Fourier  sampling  pattern,  (c)  Backprojection, 
PSNR  =  29.00dB.  (d)  Minimum  TV  reconstruction,  PSNR  =  34.23dB.  (e)  l\  analysis  recon¬ 
struction,  PSNR  =  34.37dB.  (f)  Reweighted  t\  analysis  reconstruction,  PSNR  =  34.78dB. 
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5.1  Related  work 


Whereas  we  have  focused  on  modifying  the  £\  norm,  a  number  of  algorithms  been  have  proposed  that 
involve  successively  reweighting  alternative  penalty  functions.  In  addition  to  IRLS  (see  Section  2.5), 
several  such  algorithms  deserve  mention. 

Gorodnitsky  and  Rao  [55]  propose  FOCUSS  as  an  iterative  method  for  finding  sparse  solutions 
to  underdetermined  systems.  At  each  iteration,  FOCUSS  solves  a  reweighted  £ 2  minimization  with 
weights 


(£) 

w „■  = 


„(<-!) 


(20) 


for  i  =  1,2 , ,11.  For  nonzero  signal  coefficients,  it  is  shown  that  each  step  of  FOCUSS  is  equivalent 
to  a  step  of  the  modified  Newton’s  method  for  minimizing  the  function 


^2  lo£  lu 

i:Xi^0 


subject  to  y  =  <hx.  As  the  iterations  proceed,  it  is  suggested  to  identify  those  coefficients  apparently 
converging  to  zero,  remove  them  from  subsequent  iterations,  and  constrain  them  instead  to  be 
identically  zero. 

In  a  small  series  of  experiments,  we  have  observed  that  reweighted  £\  minimization  recovers 
sparse  signals  with  lower  error  (or  from  fewer  measurements)  than  the  FOCUSS  algorithm.  We 
attribute  this  fact,  for  one,  to  the  natural  tendency  of  unweighted  l\  minimization  to  encourage 
sparsity  (while  unweighted  i 2  minimization  does  not).  We  have  also  experimented  with  an  e- 
regularization  to  the  reweighting  function  (20)  that  is  analogous  to  (7).  However  we  have  found 
that  this  formulation  fails  to  encourage  strictly  sparse  solutions.  It  must  be  noted,  however,  that 
since  the  initial  submission  of  our  manuscript,  we  have  learned  of  ongoing  work  in  developing 
reweighting  strategies  for  FOCUSS  as  well  [56,57];  in  some  situations  the  performance  can  be 
competitive  with  reweighted  i\  minimization,  and  a  more  comprehensive  study  is  warranted. 

Harikumar  and  Bresler  [58]  propose  an  iterative  algorithm  that  can  be  viewed  as  a  generalization 
of  FOCUSS.  At  each  stage,  the  algorithm  solves  a  convex  optimization  problem  with  a  reweighted 
£2  cost  function  that  encourages  sparse  solutions.  The  algorithm  allows  for  different  reweighting 
rules;  for  a  given  choice  of  reweighting  rule,  the  algorithm  converges  to  a  local  minimum  of  some 
concave  objective  function  (analogous  to  the  log-sum  penalty  function  in  (8)).  These  methods  build 
upon  £2  minimization  rather  than  l\  minimization. 

Delaney  and  Bresler  [59]  also  propose  a  general  algorithm  for  minimizing  functionals  having 
concave  regularization  penalties,  again  by  solving  a  sequence  of  reweighted  convex  optimization 
problems  (though  not  necessarily  £ 2  problems)  with  weights  that  decrease  as  a  function  of  the  prior 
estimate.  With  the  particular  choice  of  a  log-sum  regularization  penalty,  the  algorithm  resembles 
the  noise-aware  reweighted  l\  minimization  discussed  in  Section  3.3. 

Finally,  in  a  slightly  different  vein,  Chartrand  [52]  has  recently  proposed  an  iterative  algorithm 
to  minimize  the  concave  objective  |  xj  | ip  with  p  <  1.  (The  algorithm  alternates  between  gradient 
descent  and  projection  onto  the  constraint  set  y  =  <hx.)  While  a  global  optimum  cannot  be 
guaranteed,  experiments  suggests  that  a  local  minimum  may  be  found — when  initializing  with  the 
minimum  7 2  solution-  that  is  often  quite  sparse.  This  algorithm  seems  to  outperform  (Pi)  in 
a  number  of  instances  and  offers  further  support  for  the  utility  of  nonconvex  penalties  in  sparse 
signal  recovery.  There  is  also  some  theory  showing  that  just  like  for  £q  or  i\  minimization,  the  lp 
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minimizer  for  p  <  1  enjoys  nice  theoretical  guarantees,  perhaps  a  bit  better  than  those  available  for 
the  l\  minimizer  [52,60]  (the  latter  reference  is  posterior  to  the  present  paper  and  has  been  included 
in  the  revised  version).  We  anticipate  that  similar  theoretical  guarantees  would  be  available  for 
the  minimizer  of  the  log-sum  penalty  as  well.  The  problem  with  these  results  is  that  unlike  i\ 
minimization,  there  are  no  known  polynomial-time  algorithms  producing  the  iv  minimizer  or  the 
log-sum  minimizer.  To  reiterate,  a  major  advantage  of  reweighted  l\  minimization  in  this  thrust  is 
that  (1)  it  can  be  implemented  in  a  variety  of  settings  (see  Sections  3  and  4)  on  top  of  existing  and 
mature  linear  programming  solvers  and  (2)  it  typically  converges  in  very  few  steps.  The  log-sum 
penalty  is  also  more  £o-like  and  as  we  discuss  in  Section  2.4,  additional  concave  penalty  functions 
can  be  considered  simply  by  adapting  the  reweighting  rule. 

5.2  Future  directions 

In  light  of  the  promise  of  reweighted  t\  minimization,  it  seems  desirable  to  further  investigate  the 
properties  of  this  algorithm. 

•  Under  what  conditions  does  the  algorithm  converge?  That  is,  when  do  the  successive  iterates 

have  a  limit  x*-00^? 

•  As  shown  in  Section  2,  when  there  is  a  sparse  solution  and  the  reweighted  algorithm  finds 
it,  convergence  may  occur  in  just  very  few  steps.  It  would  be  of  interest  to  understand  this 
phenomenon  more  precisely. 

•  What  are  smart  and  robust  rules  for  selecting  the  parameter  e?  That  is,  rules  that  would 
automatically  adapt  to  the  dynamic  range  and  the  sparsity  of  the  object  under  study  as  to 
ensure  reliable  performance  across  a  broad  array  of  signals.  Of  interest  are  ways  of  updating 
e  as  the  algorithm  progresses  towards  a  solution.  Of  course,  e  does  not  need  to  be  uniform 
across  all  coordinates.  As  demonstrated  in  [61],  some  possible  guidance  in  these  directions 
may  come  from  studying  connections  in  robust  statistics. 

•  We  mentioned  the  use  of  other  functionals  and  reweighting  rules.  How  do  they  compare? 

•  Finally,  any  result  quantifying  the  improvement  of  the  reweighted  algorithm  for  special  classes 
of  sparse  or  nearly  sparse  signals  would  be  significant. 
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